19th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 THE DUPLEX-THEORY OF LOCALIZATION INVESTIGATED UNDER NATURAL CONDITIONS
Abstract
The duplex theory postulates that low-frequency tones are localized on the basis of interaural time differences (ITDs) while tones above 1.5 kHz are localized by evaluating interaural level differences (ILDs) [1]. Recent research showed that ITDs are also dominant for wide-band sounds [2]. The contribution of envelope-ITDs is supposedly small, but enhanced by temporal modulation [3]. Using tones, previous studies investigated the sensitivity to isolated binaural cues or the dependence of image location on traded cues [4]. The current study investigated lateralization for conflicting, but natural ILDs and ITDs by modifying head-related transfer functions. Preliminary results of 3 subjects showed clear ITD-dominance for almost all tested natural and synthetic sounds. Surprisingly, ITDs dominated for flute sounds, while for a 2 kHz high-pass noise both cues received almost equal weights. The highest relative ILD-weighting was seen when ILDs and ITDs were out of their natural combination, even before two separate images were heard. In cases of image splits, one image was mostly ILD-dominated, the other ITD-dominated. Image splits of up to 60% across all tested conditions were reported for a short melody played on a flute, while few splits were heard for a single flute tone, suggesting that temporal modulation increases the tendency for grouping into two objects. Binaural cue weighting showed large variability between subjects.

INTRODUCTION

The auditory system analyses sounds for their spectral content and temporal information and also groups components with respect to binaural information. In natural situations, binaural cues of interaural time and intensity are present in the relationships given by head-related transfer functions (HRTFs). The plausibility hypothesis suggests that the auditory system uses the cue that provides a maximum of reliable information [5].
The weighting of this information will depend on the stimulus, and it might change if HRTFs are used or if lateralization studies are done with unnatural cue combinations. The purpose of the present study was to investigate binaural cue weighting with near-natural binaural cues by using individually-adapted HRTFs. HRTFs were manipulated so that ITDs stemmed from one direction while ILDs and spectral cues were in accord with another direction. Lateralization was investigated and subjective reports of hearing one or two images were recorded. While previous studies used a small set of synthetic stimuli, the focus of the present study was to expand the view to a wide array of natural and synthetic stimuli. Specifically, we were interested in how binaural cue weighting changes for various degrees of discordance of binaural cues. The manuscript presents results from a preliminary evaluation of 3 out of 5 subjects.

METHODS

Virtual acoustics

Relative weighting of binaural cues was studied by varying ITDs and ILDs in virtual acoustical space. In a pre-test, subjects selected a set of HRTFs from a catalogue of non-individual HRTFs. The selection procedure identifies a set of HRTFs for each subject that improves performance with respect to localization error, variance, front-back confusions, and inside-the-head localization [6]. By means of the fast Fourier transform (FFT), HRTFs from different originating directions are split into their amplitude and phase components separately for each ear. Amplitude components from one direction are then recombined with phase components from another direction, so that ILDs come from, e.g., +30° while ITDs might stem from -60° [7]. A Gaussian window with 4.4 dB damping at the sides was applied to the impulse response of the recombined HRTF to reduce sharp windowing effects introduced by larger phase shifts. Sound stimuli were convolved with the resulting head-related impulse responses (HRIRs) to yield a virtual acoustics stimulus.
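The recombination step described above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names, the equal-length HRIR arrays, and the choice of centring the Gaussian window on the middle of the impulse response are assumptions made only for illustration. The 4.4 dB edge damping is read here as an amplitude attenuation.

```python
import numpy as np

def recombine_hrir(hrir_level_dir, hrir_time_dir):
    """Hybrid HRIR for ONE ear: magnitude from one direction, phase from another.

    Both inputs are assumed to be real impulse responses of equal length.
    """
    n = len(hrir_level_dir)
    magnitude = np.abs(np.fft.rfft(hrir_level_dir))  # level and spectral cues
    phase = np.angle(np.fft.rfft(hrir_time_dir))     # timing (phase) cues
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)

def gaussian_window(n, edge_db=4.4):
    """Gaussian window whose first and last samples are edge_db below the peak.

    Assumption: the window is centred on the middle of the impulse response.
    """
    edge_gain = 10.0 ** (-edge_db / 20.0)
    centre = (n - 1) / 2.0
    sigma = centre / np.sqrt(-2.0 * np.log(edge_gain))
    k = np.arange(n)
    return np.exp(-0.5 * ((k - centre) / sigma) ** 2)

def render_one_ear(stimulus, hrir_level_dir, hrir_time_dir):
    """Window the hybrid HRIR and convolve it with the stimulus."""
    hybrid = recombine_hrir(hrir_level_dir, hrir_time_dir)
    hybrid = hybrid * gaussian_window(len(hybrid))
    return np.convolve(stimulus, hybrid)
```

The function would be called once per ear, for example with the left-ear HRIR for +30° as `hrir_level_dir` and the left-ear HRIR for -60° as `hrir_time_dir`, and likewise for the right ear.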
Stimuli were presented through calibrated, diffuse-field equalized Sennheiser HD580 headphones.

Lateralization method

Figure 1: Line dissection method: The line was projected on a screen at 2 m distance in front of the subject. The bar could be adjusted with a trackball to the position of the lateralized sound image. (Endpoint labels in the figure: Left Ear (−1.0), Right Ear (+1.0).)

The selection procedure individually adapted HRTFs to present natural values of binaural cues. This resulted in externalization of most stimuli presented with unprocessed HRTFs, which contained congruent binaural cues. Most conditions, however, involved stimuli with discordant ITDs and ILDs, which were perceived within the head. We decided to use a line-dissection method to assess lateralization of the sound image within the head. If stimuli were perceived outside the head, subjects were instructed to project their location onto the interaural axis. The line-dissection method is pictured in Figure 1. The white line was projected on an otherwise black screen at 2 m distance in front of the subject. It covered a visual angle of approx. ±25° and the endpoints were marked with vertical bars and the words “left ear” or “right ear”. The subject adjusted a red bar to the perceived lateralized sound position with a trackball. For the data analysis in this manuscript the left ear position is assigned -1 and the right ear +1.

Stimuli

Eight different stimuli that should give rise to different weightings of binaural cues were used in the experiments. The focus was on including natural stimuli with synthetic counterparts. Table 1 gives an overview of stimulus type, bandwidth, envelope structure, duration, and level, as well as the predicted provision of localization cues according to the duplex theory and its recent extensions.

Table 1: Overview of stimuli and their predicted provision of localization cues.
Stimulus (1) | Bandwidth [Hz] | Envelope slopes [ms] (2) | Duration [ms] | Level [dB SPL] (3) | Localization cues provision (column order: ILDs, carrier ITDs, envelope ITDs, spectrum; symbols as listed)
Burst 3ms | 20-15000 | 0.5 | 3 | 65 | ++ ++ ++ ++
WBN 300ms | 20-15000 | 20 | 300 | 55 | ++ ++ ○ ++
LPN 1kHz | 20-1000 | 20 | 300 | 60 | ○ ++ ○
HPN 2kHz | 2000-15000 | 20 | 300 | 60 | ++ ○ +
HCT 200Hz | 200-10000, CCITT-filter (4) | 20 | 300 | 55 | ++ ++ ○ ++
“Shape” | Speech | | 800 | 60 | ++ ++ ++ ++
Flute 1 Note | 440-13000 | (20) | 180 | 65 | ++ ○ + ++
Flute 8 Notes | 440-13000 | (20) | 1080 | 65 | ++ ○ + ++

(1) WBN: Wide-band noise (Gaussian noise); LPN: Low-pass noise; HPN: High-pass noise; HCT: Harmonic complex tone.
(2) Gaussian slopes applied to envelope, rise-time in ms.
(3) Steady-state level of noise before envelope application; overall level for other sounds.
(4) CCITT-filter applied to simulate average spectrum of speech.

Experimental Procedures and Subjects

All combinations of ITDs and ILDs stemming from directions -60, -30, 0, +30, and +60° were tested with each of the eight stimuli. Ten trials were collected for each combination (5 ITDs × 5 ILDs × 10 trials × 8 stimuli = 2000 trials total). The 2000 trials were administered in random order to prevent adaptation effects to discordant binaural cues. Level was roved randomly in 7 steps within ±6 dB of the target sound level (Table 1). The presentation was divided into 18 runs. Each run took about 8 min to complete and subjects had to take short breaks between runs. Subjects received at least one training block prior to data collection. Training consisted of 2 runs of 100 trials each, in which all direction combinations with all stimuli were presented once. In a single trial the sound was presented, followed by a pause of 0.5 s, after which the adjustable bar appeared in the middle of the line. The subject moved the bar to the lateralized sound position and confirmed this by pressing a button on the trackball.
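The trial design described above can be sketched in a few lines of Python. The dictionary keys, the seed handling, and in particular the equal 2 dB spacing of the seven roving steps are assumptions for illustration; the text only states that there were 7 steps within ±6 dB.

```python
import itertools
import random

DIRECTIONS_DEG = [-60, -30, 0, 30, 60]
STIMULI = ["Burst 3ms", "WBN 300ms", "LPN 1kHz", "HPN 2kHz",
           "HCT 200Hz", "Shape", "Flute 1 Note", "Flute 8 Notes"]
REPETITIONS = 10
# Assumption: seven equally spaced roving steps spanning -6 dB to +6 dB.
ROVE_STEPS_DB = [-6, -4, -2, 0, 2, 4, 6]

def build_trials(seed=None):
    """Return all ITD x ILD x stimulus x repetition trials in random order."""
    rng = random.Random(seed)
    trials = [
        {"stimulus": stim, "itd_deg": itd, "ild_deg": ild,
         "rove_db": rng.choice(ROVE_STEPS_DB)}
        for stim, itd, ild in itertools.product(
            STIMULI, DIRECTIONS_DEG, DIRECTIONS_DEG)
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)  # randomize order across stimuli and cue combinations
    return trials
```

Shuffling the complete list, rather than shuffling within each stimulus, mixes stimuli and cue combinations from trial to trial, which matches the stated aim of preventing adaptation to discordant cues.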
The left button coded hearing a single image, while subjects were instructed to press the right button if two or more images were perceived. The bar disappeared and after 0.5 s the next sound was presented. The line was visible throughout the experiment. At the beginning of each experimental run five uncounted random trials were presented for accommodation. Experiments utilized the Simulated Open-Field Environment in the otherwise darkened anechoic chamber [8]. Results from three subjects are presented in this manuscript, while the final study will include five subjects. One male and two female subjects participated, aged 21, 22 and 30 years. All subjects had normal hearing thresholds within 300 Hz to 10 kHz as assessed with a Békésy tracking procedure. Subjects received a payment for their participation. The study protocol was approved by the ethics committee at the University of California at Berkeley.

RESULTS AND DISCUSSION

Figure 2 gives an overview of all subject responses for discrepant and coincident ITD- and ILD-directions. Each row represents results for a different stimulus. Results for the 3 ms wide-band noise burst are given in the top row. Several studies have investigated localization cues for such stimuli, and it is thought that ITDs dominate localization through being available in the phase at low frequencies and in the envelope. A pure ITD-dominance would be visible as a diagonal distribution of responses across all panels of a row in Figure 2. Clearly, for the WBN-burst this is not the case. Responses are coarsely aligned along the diagonal; however, for ILDs kept at -60° responses move only slightly to the centre when ITDs are changed to stem from 0° instead of -60° (upper left panel of Figure 2). When ITDs are moved further away than 0°, responses split into two separate clusters, indicating perception of two separate images. Similar splits can be seen for most sounds, particularly if discrepancies between ITD- and ILD-directions exceed 60°.
Even if no splits are apparent, e.g. for ITDs from 0° and ILDs from +60°, the response distribution broadens and response histograms often already show a bimodal distribution. The second row of Figure 2 shows results from WBN with 300 ms duration. ITD-weighting should be smaller for this stimulus compared to the WBN-burst, as envelope-ITDs are not as emphasized. Instead, ITDs might be weighted even more strongly, as responses appear more compact, especially for larger ITD-ILD discrepancies in the middle panel (ILDs from 0°). Results for LPN are strongly ITD-dominated, as expected. Responses for HPN, in contrast, are not purely ILD-dominated. Part of the responses shift according to the ILD-direction, but for large ITD-ILD discrepancies image splits are visible, i.e. one image is dominated by ITDs, the other by ILDs. This relatively strong ITD-influence on HPN is surprising since envelope cues are not emphasized by modulation. Results for the other two wide-band stimuli, the HCT and the word “Shape”, are similar to those of the 300 ms WBN. The two flute sounds also produced responses similar to those for the wide-band stimuli, but the audibility of split images differed.

In the following we will evaluate ITD- and ILD-dominance numerically. Figure 3 reports relative ITD- and ILD-influence as well as the percentage of subjectively reported occurrence of two images. The ITD-ILD-weights were computed from the gradient to the surface of the means of subject responses in the ITD-ILD-plane, i.e. the means of the responses in Figure 2. The results in column 1 generally confirm the pattern of binaural cue weights described above. Particularly apparent is the stronger ITD-dominance for the longer duration WBN compared to the burst. Also apparent is that ITD-dominance is restricted to the diagonal; once discrepancies become too large, ILD-influence becomes stronger.
Eventually responses split in two images which

[Figure: lateralized position (left: −1, −0.5, 0, +0.5, right: +1) versus ITD-direction in degrees; panels for ILD: −60 deg, −30 deg, 0 deg, +30 deg, +60 deg; row label: 3 ms Burst]
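The gradient-based cue weights mentioned in the text can be illustrated with a hedged NumPy sketch. The paper states only that the weights were computed from the gradient of the surface of mean responses in the ITD-ILD plane. The normalization used below (absolute slope along one cue divided by the sum of both absolute slopes), the function name, and the array layout are assumptions for illustration and may differ from the authors' procedure.

```python
import numpy as np

def cue_weights(mean_responses, directions_deg):
    """Relative ITD and ILD weights from the gradient of a response surface.

    mean_responses[i, j] is the mean lateralization (-1 to +1) for the i-th
    ILD-direction and the j-th ITD-direction (layout is an assumption).
    """
    surface = np.asarray(mean_responses, dtype=float)
    dirs = np.asarray(directions_deg, dtype=float)
    # np.gradient returns one partial derivative per axis: axis 0 (ILD), axis 1 (ITD).
    d_ild, d_itd = np.gradient(surface, dirs, dirs)
    total = np.abs(d_itd) + np.abs(d_ild)
    # Assumed normalization; points with a flat surface get NaN weights.
    w_itd = np.divide(np.abs(d_itd), total,
                      out=np.full_like(total, np.nan), where=total > 0)
    return w_itd, 1.0 - w_itd
```

Under this normalization, a surface that depends only on the ITD-direction yields an ITD weight of 1 everywhere, which corresponds to the purely diagonal response pattern described for ITD-dominance.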
Publication date: 2007